Towards Multimodal Spoken Language Corpora: TransTool And SyncTool
Authors
Abstract
This paper argues for the usefulness of multimodal spoken language corpora and specifies the components of a platform for the creation, maintenance and exploitation of such corpora. Two of the components, which have already been implemented as prototypes, are described in more detail: TransTool and SyncTool. TransTool is a transcription editor meant to facilitate and partially automate the task of a human transcriber, while SyncTool is a tool for aligning the resulting transcriptions with digitized audio and video recordings in order to allow synchronized presentation of different representations (e.g., text, audio, video, acoustic analysis). Finally, a brief comparison is made between these tools and other programs developed for similar purposes.

1. Introduction

The availability of adequate tools for the creation, maintenance and use of multimodal spoken language corpora is an important instrumental goal for spoken language research, whether this research is motivated primarily by the desire to gain a better understanding of the mechanisms of spoken communication or by the wish to develop practical applications such as multimodal interfaces for human-machine interaction. Multimodal dialog systems will be a feature of many future applications, e.g., information systems, as well as of many VR systems and tutoring systems. The basic source of inspiration for dialog systems is ordinary human face-to-face communication involving both speech and gestures. However, our understanding of human communication as a multimodal phenomenon is still very limited. Thus, there is a need for tools which will enable us to gain a better understanding of the relations between properties of human face-to-face communication, such as gestures, intonation, words and grammar, and of how the utterances and gestures of different speakers are coordinated with each other.

In this paper, we report on a long-term project to develop a platform for multimodal spoken language corpora. More specifically, we describe two modules of such a platform, both of which exist in prototype implementations. The first of these modules, TransTool, is a transcription editor which assists a human transcriber in producing transcriptions in accordance with a given standard and partially automates some of the tasks involved, e.g., the marking of overlapping speech. The second one, SyncTool, is a tool for aligning transcriptions with the corresponding digitized audio and video recordings in order to allow synchronized display of different representations. Again, this is meant to support a human analyst rather than to provide a completely automated process, although the latter would of course be preferable in the long run. Before we turn to a detailed description of TransTool and SyncTool, however, we will set the stage by presenting the platform for multimodal spoken language corpora of which these tools are meant to be part.
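To illustrate the kind of representation that tools like TransTool and SyncTool work with, the following is a minimal sketch, in Python, of a time-aligned transcription model: segments with speaker labels and time codes, a check for overlapping speech (the kind of marking TransTool partially automates), and a lookup of the segments active at a given playback time (the kind of query a synchronized text/audio/video display needs). All names here (Segment, overlaps, segments_at) are hypothetical illustrations; the paper does not specify the tools' internal data structures.

# Hypothetical sketch of a time-aligned transcription; not taken from
# the actual TransTool/SyncTool implementation described in the paper.
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    speaker: str   # speaker label, e.g. "A" or "B"
    text: str      # transcribed utterance (or part of one)
    start: float   # start time in seconds, relative to the recording
    end: float     # end time in seconds

def overlaps(a: Segment, b: Segment) -> bool:
    # True if two segments by different speakers overlap in time,
    # i.e. the overlapping speech a transcription standard would mark.
    return a.speaker != b.speaker and a.start < b.end and b.start < a.end

def segments_at(transcription: List[Segment], t: float) -> List[Segment]:
    # All segments active at playback time t, which is what a
    # synchronized presentation would highlight alongside audio/video.
    return [s for s in transcription if s.start <= t < s.end]

# Usage example with two partially overlapping utterances.
transcription = [
    Segment("A", "yes but I think", 12.40, 14.10),
    Segment("B", "mm", 13.80, 14.30),
]
print(overlaps(transcription[0], transcription[1]))        # True
print([s.text for s in segments_at(transcription, 13.9)])  # both segments

The point of the sketch is only that once transcription segments carry explicit time codes, overlap marking and synchronized display reduce to simple interval operations; producing those time codes is the manual alignment task SyncTool supports.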
Similar resources
Designing a Multimodal Spoken Component of the Australian National Corpus
Spoken language and interaction lie at the core of human experience. The primary medium of communication is speech, with some estimating the ratio of spoken-written language to be as high as 90%-10% (Cermák, 2009, p. 115). Yet they have remained poor cousins in the building of corpora to date. Not only are spoken corpora much smaller than written corpora (Xiao, 2008), the overwhelming focus in ...
A Portuguese spoken and multi
This paper presents an overview of the spoken and multimodal dialog Portuguese corpora collected in the context of the FASiL (Flexible and Adaptive Spoken Language and Multi-Modal Interfaces) project. The project developed a Virtual Personal Assistant application in the Personal Information Management domain, exploiting the state-of-the-art of speech and multi-modal technology. The FASiL corpora...
From multilingual multimodal spoken language acquisition towards on-line assistance to intermittent human interpreting: SIM*, a versatile environment for SLP
We present and discuss SIM*, a versatile multiplatform simulation environment for Speech MT. Based on a Wizard of Oz scheme, it is firstly intended for supporting and collecting multimodal bilingual spontaneous spoken dialogues through the Internet, in order to later build annotated multimodal multilingual speech corpora, on task-oriented subdomains. Current prototyping investigates a symmetrica...
The NITE XML Toolkit: Demonstration from five corpora
The NITE XML Toolkit (NXT) is open source software for working with multimodal, spoken, or text language corpora. It is specifically designed to support the tasks of human annotators and analysts of heavily cross-annotated data sets, and has been used successfully on a range of projects with varying needs. In this text to accompany a demonstration, we describe NXT along with four uses on differ...
Work on Spoken (Multimodal) Language Corpora in South Africa
This paper describes past, ongoing and planned work on the collection and transcription of spoken language samples for all the South African official languages and as part of this the training of researchers in corpus linguistic research skills. More specifically the work has involved (and still involves) establishing an international corpus linguistic network linked to a network hub at a UNISA...
Publication date: 1998